This study tests the hypothesis of a gender
difference in academic achievement as a function of measurement
method. The biasing influence of measurement method on achievement
has been recognized. Campbell and Fiske (1959) suggested that a
considerable proportion of the variation in test scores may be due to
features of the form of test (method) used rather than the individual
characteristics (traits) which the test is designed to measure. Using
a sample of 15-16-year-old Irish school students, gender differences
in achievement were examined for three school subjects (Irish,
English, and Mathematics), assessed by two methods (multiple-choice
test and written public examination). As has been found in other
countries, males performed significantly better than females on
multiple-choice tests, compared with performance on written
examinations. An additional hypothesis, that the gender difference
would be largest for the languages and smallest for mathematics, was
not supported. This finding runs contrary to an explanation of this
phenomenon in terms of greater verbal skills of females. Alternative
explanations are proposed and educational policy implications are
discussed. (Author/DWH)
ABSTRACT

This study tests the hypothesis of a gender difference in academic
achievement as a function of measurement method. Using a sample of
15-16-year-old Irish school students (N=1565), gender differences in
achievement were examined for three school subjects (Irish, English,
Mathematics), assessed by two methods (multiple-choice test, written
public examination). As has been found in other countries, males
performed significantly better than females on multiple-choice tests,
compared with written examinations. An additional hypothesis, that the
gender difference would be largest for the languages and smallest for
mathematics, was not supported. This finding runs contrary to an
explanation of this phenomenon in terms of greater verbal skills of
females. Alternative explanations are proposed and educational policy
implications are discussed.
Gender difference in academic achievement
according to method of measurement
The biasing influence of measurement method on research findings has
long been recognised, and methodologists have urged that multiple
measures of constructs be obtained in order to counteract this
potential problem (Campbell and Fiske, 1959; Cook and Campbell, 1979).
This issue has acquired relevance in the educational literature due to
the finding that males perform relatively better than females on
multiple-choice tests compared with written examinations (Dwyer, 1979;
Murphy, 1982). Moreover, quasi-experimental effects on public
examination scores have been noted in the United Kingdom following a
change from the use of written questions only to a combination of
written and multiple-choice questions. Murphy (1980) provides
time-series evidence that following the introduction of a
multiple-choice paper into a 1977 public examination in Geography,
"the percentage of male candidates obtaining A, B, or C grades became
approximately 10% higher than the equivalent figure for the female
candidates".

We regard these findings as exemplifying the method-trait
distinction proposed by Campbell and Fiske (1959). In their now-classic
paper, the authors suggest that a considerable proportion of the
variation in test scores may be due to features of the form of test
(method) used rather than the individual characteristics (traits) which
the test is designed to measure. This issue of method variance is
central to the present problem, for it appears that substantial gender
differences in achievement are attributable to differences in the
method by which achievement is measured.



The main explanation offered for these results has been that they
reflect a gender difference in verbal ability such that females possess
a relative advantage where a written assessment method is used.
Evidence for this proposition is provided by Murphy (1982), who found
consistent gender differences in achievement according to measurement
method (multiple-choice, written test) for a wide range of subjects
except mathematics. Given the low verbal content required in
mathematics examinations, whether written or multiple-choice, this
finding supports the verbal hypothesis.

The aims of the present paper are twofold. First, I will test the
cross-cultural generalizability of this finding by determining whether
a sex difference in achievement, as a function of measurement method,
exists in the Irish case. To do so, I will utilise information
collected on a cohort of pupils whose academic performance was assessed
over a number of years in the 1970s. I will focus on three
second-level school subjects (English, Irish, and Mathematics) for
which scores on standardised multiple-choice tests and grades on the
Intermediate Certificate public examination (taken midway through the
high school years) are available.

Second, as only one study to date has attempted to explain these
findings (Murphy, 1982), I will test the adequacy of the verbal
hypothesis as an explanation of the gender difference (if found) using
a more appropriate statistical model (repeated measures ANOVA vs. a
sequence of t-tests). Consistent with a concern with the issues
identified by Campbell and Fiske (1959), throughout the remainder of
the paper I will utilise their terminology by referring to achievement
in Irish, English and Mathematics as traits, and by referring to the
types of measurement used, i.e., multiple-choice tests and written
public examinations, as methods.

METHOD
Sample

A random sample of 70 high schools, stratified on the basis of
gender composition, type, and size, was obtained as part of a
large-scale longitudinal study of the effects of standardized testing
in the Irish context. The subjects in the present study were a cohort
of 15-16-year-old students who completed multiple-choice tests of
Irish, English, and Mathematics in Fall 1975 and sat the state-wide
public examinations in June 1976. Complete data were available for
1,565 pupils (773 males and 792 females).



Measures

Multiple-choice test scores were obtained using Level VI, Form A, of
the Drumcondra series of tests in Irish, English and Mathematics
(Educational Research Centre, 1978). These tests, developed shortly
before the present data were collected, were designed to assess
performance at ages 15-16, when the public examination is usually
taken. Public examination results in the three subjects were also
obtained.

A difficulty exists in relation to the public examination results in
that separate higher-level and lower-level papers were available in
each subject. For purposes of comparison with the multiple-choice test
scores, it is necessary to equate higher and lower paper grades. The
method employed in the present study utilises Martin and O'Rourke's
transformation (in press). Achievement in each subject was expressed
on an eleven-point scale. The scores on the higher and lower papers
(in each subject) were mapped onto this scale in the following way:
Higher paper, A=11, B=10, C=8, D=[illegible], E=4, F=2, No grade=0;
Lower paper, A=9, B=6, C=5, D=3, E=1, F=0, No grade=0.
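As an illustration only, the mapping just described can be written as a small lookup. The names `HIGHER`, `LOWER`, and `to_common_scale` below are hypothetical and are not taken from Martin and O'Rourke; the values are those given above, except that the higher-paper D value, which cannot be read in the source, is deliberately left undefined rather than guessed.

```python
# Hypothetical sketch of the grade-to-scale mapping described in the text.
# "NG" stands for "No grade". The higher-paper D value is illegible in the
# source, so it is stored as None instead of being guessed.
HIGHER = {"A": 11, "B": 10, "C": 8, "D": None, "E": 4, "F": 2, "NG": 0}
LOWER = {"A": 9, "B": 6, "C": 5, "D": 3, "E": 1, "F": 0, "NG": 0}


def to_common_scale(paper: str, grade: str) -> int:
    """Map a (paper, grade) pair onto the common 0-11 achievement scale.

    paper is "higher" or "lower"; grade is one of A-F or "NG".
    Raises ValueError for unknown inputs or for the illegible entry.
    """
    tables = {"higher": HIGHER, "lower": LOWER}
    if paper not in tables:
        raise ValueError(f"unknown paper: {paper!r}")
    table = tables[paper]
    if grade not in table:
        raise ValueError(f"unknown grade: {grade!r}")
    value = table[grade]
    if value is None:
        raise ValueError(f"no scale value available for {paper} paper grade {grade}")
    return value


if __name__ == "__main__":
    # A lower-paper A sits one point above a higher-paper C on the common scale.
    print(to_common_scale("higher", "C"), to_common_scale("lower", "A"))
```

Raising an error for the missing entry, rather than returning a default, keeps the sketch from silently inventing a scale point.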

Design and analysis 

For data analytic purposes, a model is required which allows the
assessment of gender differences in achievement attributable to both
method of assessment and trait assessed. A mixed-model repeated
measures analysis of variance (ANOVA) approach, which treats each of
the six attainment variables as repeated measures of a single variable,
scholastic achievement, is used (Searle, 1971; Winer, 1971). These
repeated measures, representing the factors Trait (three levels) and
Method (two levels), form the within-subjects design of the ANOVA. The
between-subjects design consists of a single factor, Gender. Note that
the fixed components of the mixed model are Gender, Trait, and Method,
whereas the subjects themselves are treated as a random sample.
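The degrees of freedom implied by this design can be checked with a little arithmetic. The sketch below (the function name `split_plot_df` is hypothetical) assumes N=1,565 pupils in two gender groups, each measured on three traits by two methods, and partitions the degrees of freedom in the usual way for a design with one between-subjects factor and two crossed within-subjects factors; the results can be compared with the df column of Table 2.

```python
def split_plot_df(n_subjects: int, n_groups: int, n_traits: int, n_methods: int) -> dict:
    """Degrees-of-freedom partition for one between-subjects factor (Gender)
    crossed with two within-subjects factors (Trait x Method)."""
    cells = n_traits * n_methods              # repeated measures per subject
    error_between = n_subjects - n_groups     # subjects within groups
    df = {
        "total": n_subjects * cells - 1,
        "between": n_subjects - 1,
        "gender": n_groups - 1,
        "error_between": error_between,
        "within": n_subjects * (cells - 1),
        "trait": n_traits - 1,
        "method": n_methods - 1,
        "trait_x_method": (n_traits - 1) * (n_methods - 1),
    }
    # Each within-subjects effect gets a Gender interaction and is tested
    # against its own (subjects within groups) x effect error term.
    for effect in ("trait", "method", "trait_x_method"):
        df[f"gender_x_{effect}"] = df["gender"] * df[effect]
        df[f"error_{effect}"] = error_between * df[effect]
    return df


if __name__ == "__main__":
    for name, value in split_plot_df(1565, 2, 3, 2).items():
        print(f"{name:>24}: {value}")
```

The within-subjects components (trait, method, their interaction, the three Gender interactions, and the three error terms) add back up to the within-subjects total, which is a quick consistency check on any printed ANOVA table of this shape.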

Confirmation of the hypothesis that males perform better than
females on the multiple-choice measures requires that a significant
Gender x Method interaction exists. Furthermore, a Gender x Trait
interaction is also expected, whereby males, regardless of method of
measurement, perform better than females in Mathematics (Maccoby and
Jacklin, 1974). Finally, the hypothesis that the method-related gender
difference is largest for the languages and weakest for Mathematics
requires a significant three-way (Gender x Trait x Method) interaction.


RESULTS 

Raw scores

Mean scores obtained by males and females on the six measures of
achievement are presented in Table 1. The group means on
multiple-choice Irish are almost identical; a small difference in
favour of males exists in the case of multiple-choice English (0.6
units); and a substantial male advantage is evident in multiple-choice
Mathematics (9.2 units). The scaled Intermediate Certificate
Examination results show a somewhat different picture, with females
performing noticeably better than males in Irish (0.4 units) and
slightly better in English (0.2 units). In Mathematics, once again
males show a substantial advantage (0.6 units).

INSERT TABLE 1 ABOUT HERE

Scaled scores

Since the above measures are on different scales, it is not possible
from Table 1 to obtain a clear indication of gender differences between
sets of measures, e.g., between methods or traits.

To do so, it is necessary to express all measures on a common scale. In
this case percentage scores have been used, i.e., in each case 100 is
the maximum score attainable. Figure 1 presents these data in the form
of the difference in percentage points between the groups (female score
minus male score) on each measure.
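The rescaling is simple arithmetic: a mean is divided by the maximum attainable score and multiplied by 100, and the male percentage is then subtracted from the female one. The sketch below applies this to the written-examination means of Table 1, assuming 11 (the top of the eleven-point scale) as the maximum; the multiple-choice maxima are not stated in the text, so those columns are omitted. Names such as `percent_of_max` and `female_minus_male` are illustrative.

```python
# Assumption: the written-exam maximum is 11, the top of the common scale
# (higher-paper A). The multiple-choice maxima are not reported, so only the
# written examination is illustrated here.
WRITTEN_MAX = 11

# Written-exam means from Table 1 (rounded to one decimal there).
written_means = {
    "Irish": {"male": 2.9, "female": 3.3},
    "English": {"male": 3.9, "female": 4.1},
    "Maths": {"male": 3.5, "female": 2.9},
}


def percent_of_max(score: float, maximum: float) -> float:
    """Express a score as a percentage of the maximum attainable score."""
    return 100.0 * score / maximum


def female_minus_male(means: dict, maximum: float) -> dict:
    """Female minus male difference in percentage points, per subject."""
    return {
        subject: percent_of_max(m["female"], maximum) - percent_of_max(m["male"], maximum)
        for subject, m in means.items()
    }


if __name__ == "__main__":
    for subject, diff in female_minus_male(written_means, WRITTEN_MAX).items():
        print(f"{subject}: {diff:+.1f} percentage points")
```

With the rounded means of Table 1, the Irish difference comes out at about +3.6 points, close to the 3.5 reported for Irish in the text; the small gap presumably reflects rounding of the tabled means. A negative value (Mathematics) indicates a male advantage.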



INSERT FIGURE 1 ABOUT HERE 



Two patterns are discernible in Figure 1. First, there is evidence
of gender differences at the level of trait, regardless of method of
measurement. This is most clear in Mathematics, where males perform
substantially better than females. The findings for the languages are
less clear and suggest that method does make a difference. In Irish,
for instance, the multiple-choice measure indicates no gender
difference, whereas the Intermediate Certificate measure indicates that
females score 3.5 percentage points higher than males on average.

The second pattern concerns the consistent way in which the
multiple-choice measures, compared to written examinations, favour
males, and conversely, the consistent female advantage attributable to
Intermediate Certificate measures, relative to multiple-choice. This
effect (indicated by arrows) can be seen by comparing the heights of
the method columns for each subject. In Irish, the male advantage
associated with method of measurement (multiple-choice vs. written
examination) is 2.8 percentage points. For English, the male advantage
is 2.2 percentage points and for Mathematics, this rises to 3.9
percentage points.
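The method effect reported for each subject is a difference of differences: the female-minus-male gap on the written examination minus the same gap on the multiple-choice test, in percentage points. The sketch below takes the three values stated in the text (the variable names are illustrative), averages them, and checks the descriptive ordering that the verbal hypothesis would predict, namely the smallest method effect for Mathematics.

```python
from statistics import mean

# Male advantage attributable to method, in percentage points (from the text):
# (female - male) on the written exam minus (female - male) on multiple choice.
method_effect = {"Irish": 2.8, "English": 2.2, "Mathematics": 3.9}

# Average method effect across the three traits.
average = mean(method_effect.values())

# Subject with the smallest method effect.
smallest = min(method_effect, key=method_effect.get)

# The verbal hypothesis predicts the smallest method effect for Mathematics.
verbal_prediction_holds = smallest == "Mathematics"

if __name__ == "__main__":
    print(f"average method effect: {average:.2f} percentage points")
    print(f"smallest method effect: {smallest}")
    print(f"ordering consistent with verbal hypothesis: {verbal_prediction_holds}")
```

The average, just under 3 percentage points, is in line with the effect size of approximately 3% reported for the Gender x Method interaction. The ordering check is descriptive only; whether the method effect genuinely differs by subject is what the three-way interaction in the ANOVA tests.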

Multivariate Analysis

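Table 2 provides results of the repeated measures ANOVA. The
dependent measure was expressed in the form of standard deviate (z)
scores. In this way, the main effects of Trait and Method and the
Trait x Method interaction, which are not of interest here, are set to
zero: since each of the six measures has a mean of zero after
standardization, all trait and method means are zero by construction,
while gender differences within each measure are preserved in
standard deviation units.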
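To see why standardizing removes the Trait and Method effects, the short sketch below z-scores six invented measures (hypothetical data for four pupils, not the study's data) and shows that every trait mean and every method mean is then zero, while each measure has unit standard deviation. Using the population standard deviation (`statistics.pstdev`) is an assumption; the text does not say which standard deviation was used.

```python
from statistics import mean, pstdev

TRAITS = ("Irish", "English", "Maths")
METHODS = ("MC", "Written")

# Hypothetical raw scores for the same four pupils on each (trait, method) measure.
raw = {
    ("Irish", "MC"): [40, 55, 30, 48],
    ("Irish", "Written"): [3, 6, 1, 4],
    ("English", "MC"): [50, 62, 45, 58],
    ("English", "Written"): [4, 5, 2, 6],
    ("Maths", "MC"): [60, 70, 41, 52],
    ("Maths", "Written"): [5, 8, 1, 3],
}


def zscores(values):
    """Standardize one measure over the pooled sample (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


z = {cell: zscores(scores) for cell, scores in raw.items()}

# Marginal means for each trait (averaged over methods) and each method
# (averaged over traits); all are zero because every measure has mean zero.
trait_means = {t: mean(mean(z[(t, m)]) for m in METHODS) for t in TRAITS}
method_means = {m: mean(mean(z[(t, m)]) for t in TRAITS) for m in METHODS}

if __name__ == "__main__":
    print(trait_means)
    print(method_means)
```

Gender differences survive this transformation because z-scoring rescales each measure as a whole; it does not remove differences between groups of pupils within a measure.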

The hypothesis of a substantial Gender x Method interaction is
supported and represents an effect size of approximately 3% of the
range of each trait, i.e., on average, females score three percentage
points higher relative to males when traits are measured by written
rather than multiple-choice methods (and vice versa for males).
Consistent with the second hypothesis, a large Gender x Trait
interaction is also evident. Contrary to expectation, the three-way
interaction, whereby the Method by Gender effect varies by Trait, is
not significant.
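As a check on Table 2, each F ratio is the effect mean square divided by its error mean square, with MS = SS/df. The sketch below (the dictionary layout and the name `f_ratio` are illustrative) recomputes the four Gender-related ratios from the printed sums of squares and degrees of freedom; it does not compute p-values.

```python
# (SS_effect, df_effect, SS_error, df_error), copied from Table 2.
table2 = {
    "Gender": (21.90, 1, 6569.67, 1563),
    "Gender x Trait": (122.59, 2, 1453.11, 3126),
    "Gender x Method": (18.20, 1, 633.30, 1563),
    "Gender x Trait x Method": (0.44, 2, 558.33, 3126),
}


def f_ratio(ss_effect: float, df_effect: int, ss_error: float, df_error: int) -> float:
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


if __name__ == "__main__":
    for source, row in table2.items():
        print(f"{source:<24} F = {f_ratio(*row):7.2f}")
```

With these inputs the recomputed ratios are roughly 5.21, 131.86, 44.92, and 1.23. Three closely match the printed values (5.21, 131.84, 1.23), while the Gender x Method value is somewhat higher than the printed 44.76, a discrepancy the printed figures alone do not resolve.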

DISCUSSION

This paper tests for the presence of a gender difference in academic
achievement according to whether multiple-choice or written examination
methods are used. The findings, based on a sample of 15-16-year-old
Irish students, are that males perform relatively better than females
on multiple-choice forms of assessment, compared with written
examinations, and vice versa for females. Furthermore, we find that
this effect of measurement method does not vary significantly across
the traits measured (achievement in Irish, English, and Mathematics).
Thus, the data indicate the existence of a method-based gender
difference, but suggest that this is not attributable to differential
verbal skill requirements between the measurement methods used.


Given this lack of support for the verbal hypothesis, several other
possibilities warrant consideration, either singly or in combination.
First, greater neatness of presentation may explain the superior
performance of females in written examinations. Previous research
indicates that neatness has a significant effect on achieved scores
(e.g., Briggs, 1980), and this explanation is consistent with the
finding of a constant gender difference across traits. Second, males
may have a greater tendency than females to guess the answers to
multiple-choice questions, and thus be more likely, on average, to
obtain the correct answer (the subjects in this study were not
informed of the advantages of intelligent guessing in multiple-choice
tests). These hypotheses have not, to date, been examined.



In the face of these alternative explanations, and given the lack of
confirmation of Murphy's (1982) results, we see the next step in this
line of research as taking the form of a study which examines the
various hypotheses simultaneously. A useful endeavor of this kind
would involve the design of multiple-choice and written forms of an
examination incorporating identical content, while simultaneously using
neatness of scripts (as assessed by raters) and the proportion of
non-responses (indexing differences in tendency to guess) as
non-experimental independent variables.

A separate issue from that of explaining these findings relates to
the educational policy of widespread use of multiple-choice tests in
many countries. If the above findings are correct, then it is
likely that the introduction of multiple-choice tests, where this
occurs, will tend to improve the performance of males relative to
females (as shown in the U.K. by Murphy, 1980). A further implication
of these findings is that the introduction of multiple-choice tests
will change the pattern of sex differences such that they will
increase in mathematical subjects, i.e., the male advantage in this
domain will increase, and decrease in verbal subjects, i.e., the
female advantage in this domain will decrease.
In conclusion, the findings reported here are indicative of an
effect of method of measurement on the relative performance of males
and females at high school level in Ireland. This effect, which
confirms the findings of researchers elsewhere, warrants further
explanatory research and, in addition, the attention of educational
policy makers with regard to the assessment methods used in public
examinations.
TABLE 1

Means and standard deviations for multiple-choice and
written measures of achievement in Irish, English,
and Mathematics; Irish high school students (N=1565).

                   MULTIPLE-CHOICE             WRITTEN EXAM
                   Irish   English  Maths      Irish   English  Maths
MALES     Mean     42.0    53.3     61.7       2.9     3.9      3.5
          SD       16.1    15.1     17.4       2.0     1.6      1.9
FEMALES   Mean     42.2    52.7     52.5       3.3     4.1      2.9
          SD       15.6    13.9     16.5       2.0     1.6      1.9
TABLE 2

Repeated measures ANOVA of academic achievement (z-scores)
by Trait, Method (within-subjects) and Gender (between-subjects)
for a sample of Irish high school students (N=1565).

SOURCE                          SS       df       MS        F      P
TOTAL                      9362.74     9389
BETWEEN-SUBJECTS           6591.57     1564
  Gender                     21.90        1    21.90     5.21   <.05
  Error                    6569.67     1563     4.20
WITHIN-SUBJECTS            2791.17     7825
  Trait                       0.00        2     0.00     0.00   N.S.
  Gender x Trait            122.59        2    61.29   131.84   <.001
  Error                    1453.11     3126     0.46
  Method                      0.00        1     0.00     0.00   N.S.
  Gender x Method            18.20        1    18.20    44.76   <.001
  Error                     633.30     1563     0.41
  Trait x Method              0.00        2     0.00     0.00   N.S.
  Gender x Trait x Method     0.44        2     0.22     1.23   N.S.
  Error                     558.33     3126     0.18